fix: ignore callbacks that arrive after the function stopped waiting #5056

Merged
jcs090218 merged 1 commit into emacs-lsp:master from alberti42:fix/lsp-request-while-no-input on May 12, 2026

@alberti42 commented May 3, 2026

Summary

lsp-request-while-no-input sends an asynchronous request and then waits
for the response inside a synchronous loop. The success and error callbacks
end with (throw 'lsp-done '_) to break out of that loop. If the response
arrives after the function has already given up and returned — for
example because the user typed during the wait, or the timeout fired —
there is no longer a (catch 'lsp-done …) on the call stack to receive
the throw. The throw then escapes all the way to the top level and shows
up as a stray error in some unrelated piece of code.

This PR fixes the race by giving the two callbacks a flag in the function's
local scope. The cleanup code in unwind-protect clears that flag before
it asks for cancellation. Any callback that fires later sees the flag as
nil and quietly does nothing instead of trying to throw to a destination
that no longer exists.

No public API changes. The normal path is unchanged. The only difference
a user can observe is that a class of confusing stray errors goes away.

Background

Two pieces of context that the rest of the PR depends on.

catch / throw in Emacs Lisp

catch and throw are Emacs Lisp's way of jumping out of a deeply nested
piece of code without going through the normal return path. They are
not the same as condition-case / signal, which are used for error
handling. A (throw 'TAG VAL) walks up the call stack looking for a
matching (catch 'TAG …) and jumps back to it, discarding all the
intermediate function calls along the way.

The property that matters for this bug: throw only works if the matching
catch is currently on the call stack. If no such catch is in progress
when throw runs, Emacs raises a no-catch error in whatever code
happens to be running at that moment. There is no notion of a "future"
or "already-completed" catch — there is only "active right now" or
"nothing there."

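To make the "active right now" rule concrete, here is a minimal sketch. The tag demo-done and both function names are made up for illustration and are not part of lsp-mode.

```elisp
;;; -*- lexical-binding: t; -*-

(defun demo-throw-with-catch ()
  "Throw to a catch that is still on the stack; the catch returns the value."
  (catch 'demo-done
    (throw 'demo-done 'result)
    'not-reached))

(defun demo-throw-without-catch ()
  "Throw with no matching catch active; Emacs signals `no-catch'.
The error data is returned so it can be inspected."
  (condition-case err
      (throw 'demo-done '_)
    (no-catch err)))

(demo-throw-with-catch)     ; ⇒ result
(demo-throw-without-catch)  ; ⇒ (no-catch demo-done _)
```

The second call is the situation this PR is about: the throw itself is well-formed, but nothing is waiting for it.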
What lsp-request-while-no-input does

It is a synchronous wrapper around an asynchronous request, with one extra
property: if the user types while it is waiting, the wait is given up and
the function returns. The function's shape:

  1. Send the request asynchronously, registering success and error callbacks.
    Each callback stores the result in a local variable and then calls
    (throw 'lsp-done '_).
  2. Wait for the response inside a while loop that calls sit-for,
    wrapped in (catch 'lsp-done …). The throw from a callback is what
    breaks out of the loop.
  3. The same loop also exits when input-pending-p becomes true (the user
    typed something).
  4. An unwind-protect cleanup runs on exit and asks the server to cancel
    the request by calling lsp-cancel-request-by-token, in case no
    response arrived in time.

The bug

Cancellation is not guaranteed. lsp-cancel-request-by-token only asks
the server to cancel; it cannot stop a response that is already on its way
back. Any of the following sequences leaves the callbacks alive after the
function has stopped waiting:

  • The request has already been sent to the server by the time the user
    types. The server will reply regardless.
  • The cancellation runs, but Emacs has already received the response from
    the server and put it in the queue of messages to be processed.
  • The server takes long enough to handle the cancellation that the
    response to the original request gets back to us first.

When the response then arrives, lsp-mode's usual machinery invokes the
registered callback. The callback runs:

(lambda (res) (setf resp-result (or res :finished)) (throw 'lsp-done '_))

resp-result is a leftover variable (the surrounding let has already
ended, so the setf writes to a binding that nothing reads any more —
harmless). The throw is the real problem: there is no (catch 'lsp-done …)
on the call stack any more. Emacs raises a no-catch error in whatever
piece of code happens to be running when the callback fires — an idle
timer, an unrelated command, the next call into lsp-mode, or Emacs's main
event loop. The error usually surfaces as a no-catch signal for the
lsp-done tag, for example:

No catch for tag: lsp-done, _

with a backtrace that has nothing to do with the original
lsp-request-while-no-input call, because the function it came from has
long since returned.
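The failure can be imitated without a language server. The sketch below is a simplified stand-in, not the lsp-mode code: demo-pending-callback plays the role of lsp-mode's table of pending callbacks, and the wait gives up immediately instead of looping on sit-for. All names are hypothetical.

```elisp
;;; -*- lexical-binding: t; -*-

(defvar demo-pending-callback nil
  "Stand-in for lsp-mode's record of callbacks awaiting a response.")

(defun demo-wait-unguarded ()
  "Register a throwing callback, then stop waiting before it fires."
  (let (result)
    (setq demo-pending-callback
          (lambda (res)
            (setq result res)
            (throw 'demo-done '_)))
    ;; The real function loops on `sit-for' here.  This sketch gives up
    ;; at once, as if the user had typed during the wait.
    (catch 'demo-done nil)
    result))

(defun demo-deliver-late-response ()
  "Fire the stored callback after the catch is gone; return the error data."
  (condition-case err
      (funcall demo-pending-callback 'late-response)
    (no-catch err)))

(demo-wait-unguarded)         ; ⇒ nil, nothing arrived in time
(demo-deliver-late-response)  ; ⇒ (no-catch demo-done _)
```

In real use there is no condition-case around the late callback, so the signal lands in whatever code happens to be running.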

The race shows up in practice because lsp-request-while-no-input is on
a high-traffic path: it is called from lsp-completion.el and
lsp-inline-completion.el, both of which drive auto-completion UIs like
Corfu — often once per keystroke. Fast typing means many overlapping
requests, each one setting up its own (catch 'lsp-done …) and then
tearing it down again. Sometimes a response arrives in the gap between
teardown and the next setup, and that is when the bug fires.

The fix

Add a flag in the function's local scope that both callbacks check before
throwing, and clear that flag from the unwind-protect cleanup before
asking for cancellation:

(let* (resp-result resp-error done?
       (catch-active t))
  (unwind-protect
      (progn
        (lsp-request-async method params
                           (lambda (res)
                             (when catch-active
                               (setf resp-result (or res :finished))
                               (throw 'lsp-done '_)))
                           :error-handler (lambda (err)
                                            (when catch-active
                                              (setf resp-error err)
                                              (throw 'lsp-done '_)))
                           …)
        … synchronous wait loop …)
    (setq catch-active nil)            ; ← clear before cancel
    (unless done?
      (lsp-cancel-request-by-token :sync-request))
    …))

How it works:

  • While the loop is still running, catch-active is t and the callbacks
    behave the same as before — they store the result and throw.
  • The unwind-protect cleanup runs whenever the function exits, by any
    path. The first thing it does is (setq catch-active nil). From that
    moment on, any callback that fires sees the flag as nil, skips the
    setf and the throw, and returns silently.
  • A plain setq on the local variable is enough because the callbacks
    refer to the same lexical binding (lsp-mode is compiled with
    lexical-binding: t).
  • Order matters: the flag is cleared before
    lsp-cancel-request-by-token is called, so even if cancellation
    somehow runs a callback synchronously, the flag is already off and the
    callback is harmless.

This is the smallest change that closes the race. It does not need to
coordinate with the response queue, and it does not change anything about
how cancellation works.
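The same idea in a self-contained form, as a sketch rather than the patched function itself. It assumes lexical binding (as the file header cookie requests); demo-guarded-callback and demo-wait-guarded are hypothetical names, and the optional argument only exists to exercise the in-time path.

```elisp
;;; -*- lexical-binding: t; -*-

(defvar demo-guarded-callback nil
  "Stand-in for lsp-mode's record of callbacks awaiting a response.")

(defun demo-wait-guarded (&optional respond-in-time)
  "Register a guarded callback and wait for it.
With RESPOND-IN-TIME non-nil, the response arrives while the catch
is still active; otherwise the wait is abandoned first."
  (let ((catch-active t)
        result)
    (unwind-protect
        (progn
          (setq demo-guarded-callback
                (lambda (res)
                  ;; Only store and throw while someone is still waiting.
                  (when catch-active
                    (setq result res)
                    (throw 'demo-done '_))))
          (catch 'demo-done
            (when respond-in-time
              (funcall demo-guarded-callback 'on-time)))
          result)
      ;; Cleanup runs on every exit path and disarms the callback.
      (setq catch-active nil))))

(demo-wait-guarded t)                           ; ⇒ on-time
(demo-wait-guarded)                             ; ⇒ nil, gave up
(funcall demo-guarded-callback 'late-response)  ; ⇒ nil, no error
```

The last call is the late response: the closure still exists, but it sees catch-active as nil and does nothing.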

Why this is preferable to the obvious alternatives

A few alternatives one might reach for, and why they are judged to be worse:

  • Wrap the throws in a fresh catch inside the callbacks. This would
    add unused catch blocks to other code paths and silently throw away the
    response value; misleading and unnecessary.
  • Make lsp-cancel-request-by-token synchronous. Far more invasive
    and changes how cancellation works elsewhere; the fact that cancellation
    is asynchronous and not guaranteed is intentional.
  • Remove the callback from the request-tracking table during cleanup.
    lsp-mode's bookkeeping removes the entry when the response arrives, not
    when the request is cancelled. Removing it from this call site would
    mean reaching into private bookkeeping that other code expects to own.
  • Wrap the call in (condition-case nil … (no-catch nil)) to swallow
    the error. This would hide the symptom but leave the real problem in
    place: a callback that still thinks it has somewhere to throw to. It
    would also lose the return value if the throw happened to land inside
    the condition-case.

The flag is local, costs essentially nothing, and correctly captures the
rule: "the destination of the throw only exists between sending the
request and the function returning."

Compatibility

  • No API change. lsp-request-while-no-input keeps the same
    signature, return value, and behaviour on the normal path.
  • No behaviour change when no race occurs. If the response arrives
    while the function is still waiting (the common case), catch-active
    is still t when the callback runs and the throw goes through exactly
    as before.
  • Lexical binding is already required by lsp-mode (the file header has
    lexical-binding: t), so the local flag captured by the callbacks
    works as written.
  • lsp--throw-on-input is unaffected. The cleanup still throws
    'input when the user has typed. Only the 'lsp-done throw coming
    from a late response callback is suppressed.

Reproducing the bug before the patch

The race depends on timing, but it is reliably triggered when the
function is called frequently. A practical recipe:

  • Use a server that takes a noticeable amount of time to answer completion
    requests (for example ltex-ls-plus against a moderately sized buffer,
    or any server that does real work for textDocument/completion).
  • Enable lsp-completion-enable and use Corfu with corfu-auto t and a
    short corfu-auto-delay (e.g. 0.1).
  • Type continuously for several seconds. Each keystroke triggers an
    lsp-request-while-no-input call; the next keystroke gives up on the
    previous one before it has finished.
  • Watch *Messages* for no-catch errors on the lsp-done tag (for example,
    No catch for tag: lsp-done, _). They appear from time to time,
    attributed to whatever command happens to be running when the late
    response arrives.

With the patch applied, those messages stop appearing, and the completion
behaviour is otherwise unchanged.
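The settings named in the recipe could be put in an init file roughly as follows. This is a configuration sketch that assumes Corfu and lsp-mode are already installed; enabling Corfu through global-corfu-mode is an assumption, since the recipe does not say how Corfu is turned on.

```elisp
;; Let lsp-mode provide completion candidates.
(setq lsp-completion-enable t)

;; Pop up completions automatically after a short delay, so that
;; requests overlap while typing.
(setq corfu-auto t
      corfu-auto-delay 0.1)

;; Assumption: Corfu is enabled globally.  The guard keeps this form
;; harmless when Corfu is not installed.
(when (fboundp 'global-corfu-mode)
  (global-corfu-mode 1))
```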

Origin

This issue surfaced while debugging lsp-ltex-plus (an lsp-mode client
for the ltex-ls-plus grammar/spell server). ltex-ls-plus runs slow
checks alongside frequent server-initiated traffic, which made the bug
easy to trigger from interactive completion. To unblock users while a
proper fix is discussed upstream, lsp-ltex-plus has been shipping the
patch as a local override of lsp-request-while-no-input since release
v0.3.3 (https://github.com/alberti42/emacs-ltex-plus/releases). This
is mentioned only as context — the bug is in lsp-mode regardless of
which client is connected, and the fix belongs upstream so other clients
benefit without each one having to ship its own override.

Related work

This is the second of three independent fixes to how lsp-mode handles
messages when there are many requests waiting for responses at the same
time. The other two are:

  • PR 1 — make lsp--parser-on-message classify messages by their
    method field before their id. Today, when a server-initiated
    request happens to use an id that matches a pending client request,
    it is mis-routed as a response. PR 1 fixes that.
  • PR 3 — make lsp--create-filter-function keep processing the rest
    of a batch even if one handler throws. Today, a single handler that
    throws can abandon the other messages that were parsed in the same
    batch. PR 3 catches errors around each message instead of around the
    whole batch.

Each PR addresses a separate problem and is independently useful; they
are split apart to keep review focused. PRs 1 and 3 do not depend on this
one — applying any subset of the three is safe.

The success and error callbacks unconditionally `(throw 'lsp-done '_)' to
break the synchronous busy-loop's `(catch 'lsp-done ...)'. If the response
arrives after the function has unwound (cancellation is best-effort), the
catch is no longer on the stack and the throw escapes to the top level.

Capture a closed-over `catch-active' flag in the callbacks; the
`unwind-protect' cleanup invalidates the flag before requesting cancel,
so any callback fired by an already-queued response after unwind becomes
a no-op.
@alberti42 force-pushed the fix/lsp-request-while-no-input branch from 44193be to 7a1bd53 on May 11, 2026 16:34
@jcs090218 merged commit e5cdc6c into emacs-lsp:master on May 12, 2026
11 of 14 checks passed
@jcs090218
Member

Thank you!

@alberti42
Contributor Author

Thanks to you for the very prompt review!!

@alberti42 deleted the fix/lsp-request-while-no-input branch on May 13, 2026 20:32